Extracting formal knowledge (ontologies) from natural language is a challenge that can benefit from a (semi-)formal linguistic representation of texts at the semantic level. We propose to achieve such a representation by implementing the Universal Networking Language (UNL) specifications on top of RDF. Thus, the meaning of a statement in any language will be soundly expressed as an RDF-UNL graph that constitutes a middle ground between natural language and formal knowledge. In particular, we show that RDF-UNL graphs can support content extraction using generic SHACL rules and that reasoning on the extracted facts allows incoherence in the original texts to be detected. This approach is being tested in the UNseL project, which aims to extract ontological representations from system requirements and specifications in order to check that they are consistent, complete and unambiguous. Our RDF-UNL implementation and all code for the working examples of this paper are publicly available under the CeCILL-B license at https://gitlab.tetras-libre.fr/unl/rdf-unl
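As a rough illustration of the extraction step, the sketch below encodes a UNL graph as plain subject-predicate-object triples and applies one generic rule in ordinary Python. It stands in for a SHACL rule and is not the project's rule set or vocabulary: the `unl:`-prefixed names, the node identifiers, and the helpers `label` and `extract_facts` are assumptions made for this example. Only the `agt` (agent) and `obj` (object) relation labels and the `icl>` notation are taken from UNL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical RDF-like encoding of "The system sends a message."
GRAPH = {
    Triple("n1", "unl:uw", "send(icl>do)"),
    Triple("n2", "unl:uw", "system(icl>thing)"),
    Triple("n3", "unl:uw", "message(icl>thing)"),
    Triple("n1", "unl:agt", "n2"),  # n2 is the agent of n1
    Triple("n1", "unl:obj", "n3"),  # n3 is the object of n1
}


def label(graph, node):
    """Return the headword of the universal word attached to a node."""
    uw = next(t.obj for t in graph
              if t.subject == node and t.predicate == "unl:uw")
    return uw.split("(", 1)[0]


def extract_facts(graph):
    """Generic rule: every node with both an agent and an object
    yields one (agent, action, object) fact."""
    facts = []
    for agt in sorted(t for t in graph if t.predicate == "unl:agt"):
        for obj in sorted(t for t in graph
                          if t.predicate == "unl:obj"
                          and t.subject == agt.subject):
            facts.append((label(graph, agt.obj),
                          label(graph, agt.subject),
                          label(graph, obj.obj)))
    return facts


FACTS = extract_facts(GRAPH)
```

The rule is generic in that it mentions only relation labels, never a specific word, so the same function would apply to any graph using these relations. Facts of this shape could then be handed to a reasoner to look for contradictions.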
Although classifier/quantifier (CQ) expressions appear frequently in everyday communication and written documents, they are described neither in classical bilingual paper dictionaries nor in machine-readable dictionaries. The paper describes a CQ dictionary, compiled from a corpus we have annotated, and its use in French-Japanese machine translation (MT). In MT, CQs often cause lexical ambiguity, difficulties in recognizing polylexical phrases during analysis, and doubtful output in transfer and generation, in particular for distant language pairs such as French and Japanese. Our basic treatment of CQs is to annotate the corpus with UNL UWs (Universal Networking Language Universal Words), and then to produce a bilingual or multilingual dictionary of CQs, based on synonymy through identity of UWs.
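The idea of deriving dictionary entries from "synonymy through identity of UWs" can be sketched as a simple grouping step: two expressions are treated as translations of each other when they carry the same UW. The annotations below are invented and simplified for illustration and are not taken from the authors' corpus; the UW strings and the function name `build_dictionary` are likewise assumptions.

```python
from collections import defaultdict

# Invented annotations for illustration only:
# (language code, CQ expression, UW assigned by the annotator).
ANNOTATIONS = [
    ("fr", "une tasse de", "cup(icl>unit)"),
    ("ja", "一杯の", "cup(icl>unit)"),
    ("fr", "une bouteille de", "bottle(icl>unit)"),
    ("ja", "一本の", "bottle(icl>unit)"),
]


def build_dictionary(annotations, source, target):
    """Pair source and target expressions that share the same UW."""
    # First pass: group expressions by UW, then by language.
    by_uw = defaultdict(lambda: defaultdict(list))
    for lang, expression, uw in annotations:
        by_uw[uw][lang].append(expression)

    # Second pass: one entry per source expression, listing every
    # target expression annotated with the identical UW.
    entries = {}
    for uw, by_lang in by_uw.items():
        for expression in by_lang.get(source, []):
            entries[expression] = {
                "uw": uw,
                "translations": list(by_lang.get(target, [])),
            }
    return entries


FR_JA = build_dictionary(ANNOTATIONS, "fr", "ja")
```

Because the UW acts as the pivot, adding annotations for a third language would extend the same data to a multilingual dictionary without changing the grouping logic.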
We present the architecture of the UNL-French deconverter, which "generates" from the UNL interlingua by first "localizing" the UNL form for French, within UNL, and then applying slightly adapted but classical transfer and generation techniques, implemented in GETA's Ariane-G5 environment and supplemented by some UNL-specific tools. Online interaction can be used during deconversion to enhance output quality and is currently used for development purposes. We show how interaction could be delayed and embedded in the post-editing phase, which would then interact not directly with the output text, but indirectly with several components of the deconverter. Interacting online or offline can improve the quality not only of the utterance at hand, but also of the utterances processed later, as various preferences may be automatically changed to let the deconverter "learn".
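The "learning" effect of interaction can be illustrated with a minimal sketch in which a choice made for one utterance is stored and reused for later ones. This covers only the preference mechanism, not localization, transfer, or generation, and it is not the Ariane-G5 implementation: the class `PreferenceStore`, the function `choose_words`, and the tiny lexicon are hypothetical.

```python
from typing import Callable, Dict, List

# A chooser receives a UW and its candidate words and returns one of them.
Chooser = Callable[[str, List[str]], str]

# Hypothetical UW-to-French lexicon; one entry is ambiguous.
LEXICON: Dict[str, List[str]] = {
    "river(icl>thing)": ["rivière", "fleuve"],
    "house(icl>thing)": ["maison"],
}


class PreferenceStore:
    """Remembers lexical choices so later utterances can reuse them."""

    def __init__(self) -> None:
        self.choices: Dict[str, str] = {}

    def resolve(self, uw: str, candidates: List[str], ask: Chooser) -> str:
        if len(candidates) == 1:
            return candidates[0]  # nothing to decide
        if self.choices.get(uw) in candidates:
            return self.choices[uw]  # reuse an earlier decision
        choice = ask(uw, candidates)  # interact only when needed
        if choice not in candidates:
            raise ValueError(f"{choice!r} is not a candidate for {uw!r}")
        self.choices[uw] = choice
        return choice


def choose_words(uws: List[str], prefs: PreferenceStore,
                 ask: Chooser) -> List[str]:
    """Pick one French word per UW, consulting stored preferences."""
    return [prefs.resolve(uw, LEXICON[uw], ask) for uw in uws]
```

In this sketch `ask` could be an online prompt or a function replaying corrections collected during post-editing. In both cases the stored choice affects every later call that shares the same `PreferenceStore`.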
We present a new software architecture for NLP systems made of heterogeneous components, and demonstrate an architectural prototype we have built at ATR in the context of Speech Translation.